CLEAN VERSION OF REPLACEMENT PARAGRAPHS AND CLAIMS

Please amend the specification as follows. A marked-up version of the amended 
specification appears in the section entitled "VERSION WITH MARKINGS TO 
SHOW CHANGES MADE" following this amendment. 

In the Specification: 

Please amend the specification as follows: 

Please delete the paragraph beginning on page 10, line 14 with "In each read..." and 
ending on page 10, line 21 with "...as similar methods." and replace with the 
following: 

In each read there may be some longest repeat. The repeat may be of a single base 
or a repetition of multiple bases. The algorithm may cluster each of the reads based 
on the longest repeat of a single base or the longest repeat of multiple bases. A 
repeating region is well known in the art. As an example, as shown in SEQ ID NO: 
1, the sequence "TAGAGAGAGAGAGATCATCGAT" contains a GA repeat from 
bases three through sixteen, which is this sequence's longest repeat. The categories 
of the present invention may be based on the GC percentage or any other 
percentage of the repeat as well as similar methods. 
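
As an illustration only, the longest-repeat idea in the paragraph above can be sketched in Python. The function name `longest_tandem_repeat`, the `max_unit` cutoff, the requirement of at least two whole back-to-back copies, and the tie-break rule are all assumptions of this sketch and are not taken from the specification.

```python
def longest_tandem_repeat(seq, max_unit=6):
    """Find the longest run of back-to-back copies of a short unit.

    Returns (unit, start, end, copies) with 1-based inclusive positions,
    or None when no unit occurs at least twice in a row.
    Hypothetical helper: max_unit and the tie-break rule are assumptions.
    """
    seq = seq.upper()
    best = None
    best_len = 0
    for unit_len in range(1, max_unit + 1):
        for start in range(len(seq) - unit_len + 1):
            unit = seq[start:start + unit_len]
            copies = 1
            pos = start + unit_len
            # Count whole copies of the unit that follow immediately.
            while seq[pos:pos + unit_len] == unit:
                copies += 1
                pos += unit_len
            run_len = copies * unit_len
            # Strict ">" keeps the shortest unit and leftmost start on ties.
            if copies >= 2 and run_len > best_len:
                best_len = run_len
                best = (unit, start + 1, start + run_len, copies)
    return best


SEQ_ID_NO_1 = "TAGAGAGAGAGAGATCATCGAT"
print(longest_tandem_repeat(SEQ_ID_NO_1))
```

A run of AG and a run of GA can cover the same region in different phase, so this sketch resolves such a tie in favor of the leftmost start. Reads could then be grouped by the reported unit, by the run length, or by the GC percentage of the repeated region.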

Please delete the paragraph beginning on page 10, line 23 with "Each read may
have..." and ending on page 11, line 3 with "...is a G or a C." and replace with the
following: 

Each read may have a long high or low entropy area within the read's sequence. 
This area will have some GC content. An example of a sequence with high entropy 
and high GC content is SEQ ID NO: 2,
"GAGTGTATCTGCCCGCCGGCGTGCCCGGCTAC". The entropy is high because
there is not an even distribution of A, G, C, and T. The GC percentage is high 
because over 70% of the sequence is a G or a C. 
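
As a rough sketch, the GC percentage and one possible entropy measure for a read can be computed as follows. The specification does not say which entropy measure is intended; Shannon entropy over base frequencies is an assumption made here, and the function names `gc_percent` and `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq):
    """Shannon entropy, in bits per base, of the base frequencies.

    Assumption: the specification may intend a different entropy measure.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


SEQ_ID_NO_2 = "GAGTGTATCTGCCCGCCGGCGTGCCCGGCTAC"
print(f"GC percentage: {gc_percent(SEQ_ID_NO_2):.1f}")
print(f"Shannon entropy: {shannon_entropy(SEQ_ID_NO_2):.3f} bits per base")
```

Under this particular measure the value is largest (2 bits per base) when the four bases are equally frequent and falls to zero for a single repeated base. A different notion of entropy would rank sequences differently.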

Please amend the application by adding to the application, as a separate part of the 
disclosure, the content of the paper copy of the sequence listing, filed herewith. 

REMARKS



Pursuant to 37 C.F.R. 1.821(b), 37 C.F.R. 1.821(c), 37 C.F.R. 1.821(d) and 
M.P.E.P. 2422.02, Applicant submits this amendment to the specification in order to 
provide the sequence identifiers for the presented sequences. No new matter has been 
added to the application. If the examiner has any questions or concerns, it would be
appreciated if he or she would telephone the undersigned. 

The Commissioner is authorized to charge any deficiency or credit any 
overpayment in connection with this preliminary amendment to Deposit Account No. 
23-0035. 



Respectfully submitted, 




Registration No. 48,335
WADDEY & PATTERSON 
A Professional Corporation 
Customer No. 23456 
ATTORNEY FOR APPLICANT 

VERSION WITH MARKINGS TO SHOW CHANGES MADE
In the Specification: 

Please amend the specification as follows: 

Please delete the paragraph beginning on page 10, line 14 with "In each read..." and
ending on page 10, line 21 with "...as similar methods." and replace with the
following: 

In each read there may be some longest repeat. The repeat may be of a single base 
or a repetition of multiple bases. The algorithm may cluster each of the reads based 
on the longest repeat of a single base or the longest repeat of multiple bases. A 
repeating region is well known in the art. As an example, as shown in SEQ ID NO: 
1, the sequence "TAGAGAGAGAGAGATCATCGAT" contains a GA repeat from 
bases three through sixteen, which is this sequence's longest repeat. The categories 
of the present invention may be based on the GC percentage or any other 
percentage of the repeat as well as similar methods. 

Please delete the paragraph beginning on page 10, line 23 with "Each read may 
have..." and ending on page 11, line 3 with "...is a G or a C." and replace with the
following: 

Each read may have a long high or low entropy area within the read's sequence.
This area will have some GC content. An example of a sequence with high entropy
and high GC content is SEQ ID NO: 2,
"GAGTGTATCTGCCCGCCGGCGTGCCCGGCTAC". The entropy is high because
there is not an even distribution of A, G, C, and T. The GC percentage is high 
because over 70% [percent] of the sequence is a G or a C. 



Application No. 10/043,377 



CERTIFICATE OF FIRST CLASS MAILING 



I hereby certify that this Preliminary Amendment, paper copy of the sequence 
listing (1 page), and self addressed post card are being deposited with the United 
States Postal Service as first class mail in an envelope addressed to: 

Box Missing Parts 
Commissioner for Patents 
Washington, DC 20231, 

On October 1, 2002. 



Douglas W. Schelling, Ph.D.




Date 

SEQUENCE LISTING

<110> Large Scale Biology Corporation 

<120> RECURSIVE CATEGORICAL SEQUENCE ASSEMBLY 

<130> 00801-02 11-NPUSOO 

<140> 10/043,377 
<141> 2002-01-11 

<160> 2 

<170> PatentIn version 3.1

<210> 1 

<211> 22 

<212> DNA 

<213> Unknown 

<220> 

<223> Fictitious example provided on page 10 of specification
<400> 1 

tagagagaga gagatcatcg at 



<210> 2 

<211> 32 

<212> DNA 

<213> Unknown 

<220> 

<223> Fictitious example provided on page 11 of specification

<400> 2 

gagtgtatct gcccgccggc gtgcccggct ac 